(57) ABSTRACT

A system and method are provided for translating an input 
text from a natural source language to a natural target 
language. The system stores a database that contains a 
plurality of pairs of text fragments with each pair including 
a text fragment in the source language and a corresponding 
text fragment in the target language. Each text fragment 
contains at least one word phrase and represents a primary 
grammatical unit such as a sentence or a clause. For
translating a word phrase, the database is queried using a phrase
index of the database, where the phrase index indexes text
fragments by word phrases. Word phrases are noun phrases
or verb phrases. Alternatively, word phrases are predicates
involving at least one verb and one noun or adjective used 
as a noun. The system further comprises a phrase extractor 
for extracting a word phrase from a text fragment of an input 
text. 
WORD PHRASE TRANSLATION USING A PHRASE INDEX

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention generally relates to translating expressions
from one natural language into another natural language,
and in particular assisting a translator to get the right
translation for any phrase.

2. Description of the Related Art

Any translator is evaluated according to two criteria:
translation speed and translation quality. One difficulty
affecting both of these criteria is the appearance of a word
or group of words which makes the translator hesitate.
Finding the suitable translation may lead to a time-consuming
manual search, with no guarantee of the result.

Presently, several techniques have been developed for
assisting a translator. One of these techniques involves the
use of contextual dictionary look-up. Contextual dictionaries
allow for getting the translation of a word according to its
context. This technique is strongly limited in the extent to
which translations are possible, i.e. by looking up a contextual
dictionary, the translator is provided with a low number
of proposed translations only.

Further, multi-lingual terminology databases exist which
are based on translations of pre-accepted terms. This technique
is strongly restricted to the prestored set of terms, and
the translator is not assisted in translating expressions which
are not part of the set of pre-accepted terms.

A further technique is based on the use of translation
memory which stores already translated sentences. When a
sentence has to be translated, the system queries the database
and automatically proposes a translation. However, this
system requires matching complete sentences, even if the
matching can be fuzzy, so that this technique is again
strongly restricted in its applicability.

Another translation technique has been proposed by M.
Nagao, "A Framework of a Mechanical Translation between
Japanese and English by Analogy Principle", Artificial and
Human Intelligence (A. Elithorn and R. Banerji, eds.),
Elsevier Science Publishers, 1984, pgs. 173-180. This technique
involves aligning and linguistically parsing sentences
for machine translation. The parse trees from each pair of
sentences are also aligned. One drawback of this technique
is that such machine translation systems require performing
an overall parse of the translated sentences. Another drawback
is that subtrees are needed to be aligned, resulting in a
considerably high computational overhead.

SUMMARY OF THE INVENTION

The present invention has been made in consideration of
the above situation and has as its primary object to assist a
translator to achieve an improved quality of the resulting
document.

It is another object of the present invention to contribute
to a controlled translation to prevent expensive manual
search for unknown expressions, thereby providing functionality
in addition to that of using translation memory and
terminology databases.

It is still another object of the present invention to provide
the translator with an easy-to-use, efficient and reliable tool
which is capable of promptly replying to the translator's
request for assistance.

A further object of the present invention is to be compatible
with existing technology and software tools.

These and other objects of the present invention will
become apparent hereinafter.

To achieve these objects, according to a first aspect, the
invention provides a method for translating a word phrase
from a first natural language to a second natural language.
The word phrase is a group of two or more associated words.
The method comprises the steps of inputting a text written
in the first language; extracting a word phrase from said text;
and querying a database for the extracted word phrase using
a phrase index of said database. The phrase index indexes
text fragments by word phrases. The text fragments represent
a primary grammatical unit including at least one
clause. The database contains pairs of text fragments, with
each pair including a text fragment in the first language and
a corresponding text fragment in the second language. A
translation of said extracted phrase is then obtained based on
one of the pairs of text fragments revealed during the step of
querying the database.

According to a second aspect of the present invention,
there is provided a computer-readable storage medium storing
instructions for translating a word phrase from a first
natural language to a second natural language by performing
the steps according to the first aspect.

According to a third aspect of the present invention, there
is provided a system for translating an input text from a
natural source language to a natural target language. The
system comprises storage means for storing a database
containing a plurality of pairs of text fragments. The text
fragments represent a primary grammatical unit including at
least one clause. Each pair includes a text fragment in the
source language and a corresponding text fragment in the
target language. Each text fragment contains at least one
word phrase. A word phrase is a group of two or more
associated words. The system further comprises a phrase
extractor for extracting a word phrase from a text fragment
of said input text, and database retrieval means for
retrieving, from said database, pairs of text fragments that
contain the extracted word phrase, using a phrase index of
said database. The phrase index indexes text fragments by word
phrases. The system further comprises user interface
means for allowing a user to select one of said retrieved pairs
of text fragments to obtain a translation of the extracted
word phrase.

According to a fourth aspect, the invention provides a
method for generating a text fragment database for use in
translating a word phrase from a first natural language into
a second natural language. The word phrase is a group of
two or more associated words. The method comprises the
steps of inputting a first document containing a text written
in the first language; inputting a second document containing
said text written in the second language; aligning
corresponding text fragments of the first and second documents;
extracting word phrases from the text fragments of the first
document; and generating index information on the
extracted word phrases and the aligned text fragments
holding the word phrases. The text fragments represent a
primary grammatical unit including at least one clause.

According to another aspect of the present invention, in
the methods and systems according to the first to fourth
aspects, the word phrases preferably are noun phrases.
Alternatively, the word phrases may also be verb phrases. In
another alternative, the word phrases may be predicates
involving at least one verb and one noun or adjective used
as a noun.

According to still another aspect of the present invention,
the primary grammatical units are sentences.

It is still another aspect of the present invention that, once
pairs of text fragments have been retrieved from the 
database, these retrieved pairs of text fragments are pre- 
sented to the translator. Alternatively, the translator is pro- 
vided with proposed translations of the extracted word 
phrase, based on the retrieved pairs of text fragments. In
either case, the translator approves a translation, and the 
approved translation is then used as translation of the 
extracted word phrase. 

According to still another aspect of the invention, in the 
systems and methods according to the above aspects, the 
step of querying the database for the extracted phrase 
includes the step of querying the database for sub-phrases, 
i.e. for all word phrases partly matching the extracted 
phrase. 
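
By way of illustration only, the sub-phrase query can be sketched in Python as follows. The sketch assumes that "partly matching" means contiguous word sequences of two or more words taken from the extracted phrase, which is one possible reading and is not prescribed by the description. The function names and the dictionary-based `phrase_index` are hypothetical.

```python
def sub_phrases(phrase):
    """Return contiguous sub-phrases of two or more words, longest first.

    The full phrase itself is not included.
    """
    words = phrase.split()
    n = len(words)
    result = []
    for length in range(n - 1, 1, -1):        # from n-1 words down to 2 words
        for start in range(n - length + 1):   # slide a window over the phrase
            result.append(" ".join(words[start:start + length]))
    return result


def longest_indexed_sub_phrase(phrase, phrase_index):
    """Return the phrase if it is indexed, else its longest indexed sub-phrase.

    phrase_index is assumed to map a phrase to the ids of the sentence pairs
    containing it. Returns None when nothing matches.
    """
    if phrase in phrase_index:
        return phrase
    for candidate in sub_phrases(phrase):
        if candidate in phrase_index:
            return candidate
    return None
```

With an index holding "fuel pressure test" but not "fuel pressure test operations", the second function falls back to "fuel pressure test".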

Finally, the present invention according to any of the 
above aspects, may involve the step of obtaining a transla- 
tion by querying a terminology base in addition to the 
phrase-indexed text fragment database. 
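
A minimal sketch of combining the two sources is given below, purely as an assumption about how such a lookup might be arranged. The terminology base is modelled as a plain dictionary, the phrase index as a dictionary from phrase to sentence-pair ids, and the asterisk marker stands in for whatever highlighting a user interface would apply. None of these names or structures is taken from the described system.

```python
def mark_term(target_sentence, term_translation, marker="*"):
    """Surround each occurrence of the terminology translation with a marker."""
    if not term_translation:
        return target_sentence
    marked = f"{marker}{term_translation}{marker}"
    return target_sentence.replace(term_translation, marked)


def propose_translations(phrase, terminology, phrase_index, sentence_pairs):
    """Query the terminology base and the phrase-indexed database together.

    Returns the terminology translation (or None) and the retrieved sentence
    pairs, with the terminology translation marked in each target sentence.
    """
    term = terminology.get(phrase)
    pairs = [sentence_pairs[i] for i in phrase_index.get(phrase, [])]
    if term is not None:
        pairs = [(src, mark_term(tgt, term)) for src, tgt in pairs]
    return term, pairs
```

The translator would still choose between the terminology entry and a translation read off one of the retrieved sentence pairs.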

By using the approach of the present invention, the 
database is phrase-indexed. Extracted word phrases directly 
index whole text fragments. In preferred embodiments, the 
noun phrases are used to index a sentence database. The 
extracted noun phrases directly index whole sentences 
thereby leaving the recognition of the corresponding sub- 
units in the translated sentences to the translator. Therefore, 
no overall parse of the translated sentences is performed and 
no alignment of subtrees is necessary. 

The invention is further advantageous in that it makes use 
of already translated material and presents to the translator, 
in the preferred embodiment, sentences containing the 
respective noun phrase both in the source and target lan- 
guage. By using a phrase-indexed sentence database, both 
translation speed and translation quality are improved. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The accompanying drawings are incorporated into and 
form a part of the specification to illustrate several embodi- 
ments of the present invention. These drawings together 
with the description serve to explain the principles of the 
invention. The drawings are only for the purpose of illus- 
trating preferred alternative examples of how the invention 
can be made and used and are not to be construed as limiting 
the invention to only the illustrated and described embodi- 
ments. Further features and advantages will become appar- 
ent from the following and more particular description of the 
various embodiments of the invention, as illustrated in the 
accompanying drawings, wherein: 

FIG. 1 illustrates a translation system according to the 
invention; 

FIG. 2 is a flow chart illustrating the process of generating 
a phrase-indexed sentence database according to the inven- 
tion; 

FIGS. 3 and 4 are flow charts illustrating the translation 
process according to the invention; 

FIG. 5 illustrates a first embodiment of a user interface 
according to the invention; 

FIG. 6 illustrates the use of sub-phrases; and 

FIG. 7 illustrates a second embodiment of a user interface 
according to the invention. 

DETAILED DESCRIPTION OF THE 
INVENTION 

According to the invention, word phrases are used for 
indexing text fragments representing a primary grammatical 
unit including at least one clause. 

Phrases are expressions consisting usually of but a few
words, denoting a single idea or forming a separate part of
a sentence. Specifically, a phrase is a group of two or more
associated words, not containing a subject and predicate.

Noun phrases are phrases involving either pronouns or
nouns. Nouns are words used as the name of a thing, quality
or action existing or conceived by the mind. Pronouns are
words used as a substitute for a noun. Thus, noun phrases are
e.g. "road test", "fuel pressure test operations", or "verb
phrase". By contrast, verb phrases are phrases involving one
or more verbs such as "broadened" or "having been fitted".

Distinguished from a phrase, a clause is a group of words
containing a subject and predicate, that is, clauses are
syntactic constructions forming part of a sentence or
constituting a whole simple sentence. Sentences are grammatical
units of one or more words, bearing minimal syntactical
relation to the words that precede or follow them, i.e.
comprising a minimum sense of completeness and unity. Sentences
express a complete thought, whether a statement of fact, a
question, a command, or an exclamation.

Thus, clauses and sentences can be defined as primary
grammatical units. Whereas sentences may comprise clauses
and phrases, and clauses may comprise phrases, phrases
cannot comprise clauses or sentences. The preferred
embodiment of the invention makes use of noun phrases to
index sentences.

Referring now to the drawings and particularly to FIG. 1,
the translation system according to a preferred embodiment
of the invention comprises a control unit 14, which may be
a computer of any kind such as a personal computer or a
workstation, e.g. running a conventional operating system
such as Windows NT or UNIX. The control unit 14 runs a
software application which may be controlled by pointer
device 12 or keyboard 13, using a display 11. The interface
software may for instance be written in Visual Basic, and the
resulting application can be an OLE server which can be
integrated directly into any Visual Basic or C/C++ code. It will
however be appreciated by those of ordinary skill in the art
that many other kinds of implementations are also possible.
The application program running on control unit 14 has
access to sentence database 17, which might be an Access or
Oracle database and which has preferably been generated
using a UNIX workstation. The size of the database depends
on the field in which the system is used. Again, it will be
appreciated that many other implementations are possible.

Sentence database 17 stores a plurality of sentences in
English as the source language, and also stores, for each
English sentence, the corresponding sentence in French as
the target language. The sentence database 17 further
includes a phrase index holding, for each noun
phrase occurring in any sentence of the database, information
indicating the sentences in which the respective noun phrase
is to be found. The translation software running on control
unit 14 further has access to phrase extractor 15, which
extracts noun phrases from an input text using complex
linguistic algorithms. The input text may come from any text
source such as storage media, scanners, messages, speech
recognition etc.
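
For illustration, a phrase index of this kind might be built along the following lines. This is a sketch under stated assumptions: the sentence pairs are taken to be already aligned, the index is held as an in-memory dictionary rather than in an Access or Oracle database, and `extract_noun_phrases` is a toy stand-in that matches a fixed list of known phrases by substring. It does not reproduce the linguistic algorithms of phrase extractor 15.

```python
from collections import defaultdict


def extract_noun_phrases(sentence, known_phrases):
    """Toy extractor: return the known phrases that occur in the sentence."""
    lowered = sentence.lower()
    return [phrase for phrase in known_phrases if phrase in lowered]


def build_phrase_index(sentence_pairs, known_phrases):
    """Map each noun phrase to the ids of the sentence pairs containing it.

    sentence_pairs is a list of (source_sentence, target_sentence) tuples.
    Only the source sentences are scanned for noun phrases.
    """
    index = defaultdict(list)
    for pair_id, (source, _target) in enumerate(sentence_pairs):
        for phrase in extract_noun_phrases(source, known_phrases):
            index[phrase].append(pair_id)
    return dict(index)
```

Retrieving all sentence pairs for a phrase is then a single dictionary lookup followed by fetching the listed pairs, with no parsing of the stored sentences at query time.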

Referring now to FIG. 2, which illustrates a flow chart of
the process of generating the phrase-indexed sentence database
17, the system first obtains, in steps 21 and 22, a
document written in the source language and a corresponding
document written in the target language. Then, in step
23, the sentences in the pair of source and target documents
are aligned. Aligning sentences means establishing a link
between each source sentence and the corresponding target
sentence. After the sentences have been aligned, noun phrases
are extracted from each of the source sentences in step 24,
and the extracted noun phrases are added to the phrase index
of sentence database 17 together with the respective information
concerning the sentence from which the noun phrase
has been extracted. After the extracted noun phrases have
been added to the phrase index in step 25, a decision is made
in step 26 whether another pair of source/target documents
is to be read in. If so, the process of generating the database
returns to step 21 for obtaining the documents, aligning the
sentences and extracting and storing the noun phrases. By
repeating steps 21 to 26 for a set of translated document
pairs, a comprehensive database will be built storing a high
number of encountered noun phrases with the sentences
holding them.

While the process in FIG. 2 has been described in the
context of generating the phrase-indexed sentence database
17, it will be appreciated by those skilled in the art that the
illustrated process also can be used at any time to enrich an
existing database by storing sentences and noun phrases of
new document pairs. Further, it will be apparent to those of
ordinary skill in the art that the process of updating and
generating the database has been described in the context of
documents but that it may likewise be performed on the
basis of document parts.

In another preferred embodiment of the present
invention, only those documents are used for building the
database which lie in one and the same field, e.g. car
maintenance documents. By this measure the number of
possible translations of each noun phrase is reduced so that,
if the database is large enough, most of the noun phrases will
be indexed and a translation will be available for almost all
the requests.

Turning now to FIGS. 3 and 4, which illustrate a flow
chart of the translation process, the user inputs the text to be
translated in step 31. The text is then displayed so that the
user can select an unknown word or group of words, e.g. by
double-clicking on the word. Once a word has been selected
in step 32, phrase extractor 15 extracts in step 33 all the
possible noun phrases relating to the selected word. The set
of possible noun phrases is then displayed in step 34, and
one of the displayed noun phrases is selected, in step 35,
either automatically or by user request. By default, the
longest of the extracted noun phrases is selected, with no
need of user selection. This automated selection may also be
performed depending on whether the extracted noun phrases
exist in the database. In this case, the longest (sub) noun
phrase existing in the database is selected. In any case,
selection may be performed or changed by the user.

FIG. 5 shows an example of how to present the mentioned
information to the user. In a windowing operating system,
the control unit 14 displays a window 51 in which a number
of fields is shown. The word which has been selected by
double-clicking in step 32 is displayed in field 55. In field 53
the set of possible noun phrases which has been extracted in
step 33 is displayed. From this set of noun phrases the user
has selected, in the example of FIG. 5, the noun phrase "road
test", which is then shown in field 52.

Once the user has selected one of the possible noun
phrases, control unit 14 queries the phrase-indexed sentence
database 17 in step 41. Since all the sentences in the
phrase-indexed sentence database 17 are directly indexed by
noun phrases, the system is able to retrieve all pairs of
sentences indexed by the selected noun phrase without any
substantive delay. The system then sorts in step 42 the
retrieved pairs of sentences according to their relatedness to
the original sentence. For this purpose all the noun phrases
in the original sentence of the input text and those in the
retrieved sentence are compared, and the relatedness then
depends on the number of common noun phrases. Thus, the
system is able to present to the user in step 43 all the pairs
of sentences, with those sentences first which are closest to
the one to be translated. An example of a displayed sorted set
of paired sentences is depicted in FIG. 5 as field 54.

In step 44 of FIG. 4 the user selects one of the displayed
sentences of field 54, which is then copied to fields 56 and 57
to allow the user to more intensively study the proposed
translation. Once the user has decided that the selected pair
of translated sentences should be used for translating the
selected word of the input text, phrase extractor 15 extracts
in step 45 from the selected sentence pair the noun phrase
translation and inserts the translated noun phrase automatically
into the translation of the input text.

In step 46 the translator decides whether a further noun
phrase needs to be translated. If so, the process returns to
step 32 and the user selects another unknown word.

Turning now to FIG. 6, which illustrates the operation on
field 52 according to a preferred embodiment, the capability
of the system according to the invention to operate on noun
phrase sub-phrases is described. In case the user selects from
field 53 a noun phrase which has no entry in the phrase
index of sentence database 17, the system either automatically
looks for partial matching noun phrases, or presents to
the user in fields 61 and 62 a list of sub-phrases from which
the user may select an entry for proceeding with the translation
process. Assuming no sentence contains the noun
phrase "fuel pressure test operations", the user is allowed to
choose the sub-phrase "fuel pressure test" in field 62, for
which the phrase index of the database 17 might have an
entry.

The translation system according to the invention has
been described as including a noun-phrase-indexed sentence
database 17. In the preferred embodiment of FIG. 1, the
system further includes a terminology base 16 to which the
translation application running on control unit 14 has access.
Once the user has selected, in step 35, a noun phrase to be
translated, the system queries the terminology base in step
36. In case the terminology base does include a translation
of the selected noun phrase, the system presents to the user
in field 78 of FIG. 7 the retrieved translation. The user is then
allowed to either approve the translation retrieved from the
terminology base 16 and displayed in field 78 or to approve
the proposed translation retrieved from the phrase-indexed
sentence database 17 and select it from the list displayed in
field 74. For further assisting the user in deciding which
translation is to be used, the translation retrieved from the
terminology base 16 is highlighted in the list of field 74 at
each location where it appears.

As described in the foregoing, the present invention
has many advantages in that it uses a noun-phrase-indexed
sentence database in which noun phrases directly index
whole sentences, leaving the recognition of the corresponding
sub-units in the translated sentences to the user. Thus, no
overall parse for translated sentences is performed.
Nevertheless, the system according to the invention may be
integrated among other conventional tools for translating
expressions, such as example-based machine translation,
contextual dictionary lookup, multi-lingual terminology
databases, or translation memory. By providing a translation
environment using phrase-indexed sentence databases and
integrating existing technologies, any translator is provided
with a powerful new translation aid.
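
The relatedness sort of step 42 may be sketched as follows, by way of example only. The sketch assumes that the noun phrases of the original sentence and of each retrieved source sentence are already available as sets, and it takes relatedness to be simply the count of common noun phrases. The description states only that relatedness depends on that number, so the exact scoring and the function name are assumptions.

```python
def rank_by_relatedness(input_phrases, candidates):
    """Order retrieved sentence pairs by the number of shared noun phrases.

    input_phrases: noun phrases found in the sentence being translated.
    candidates: list of (pair_id, noun_phrases_of_retrieved_source_sentence).
    Pairs sharing more noun phrases with the input sentence come first.
    sorted() is stable, so ties keep their retrieval order.
    """
    wanted = set(input_phrases)
    return sorted(
        candidates,
        key=lambda candidate: len(wanted & set(candidate[1])),
        reverse=True,
    )
```

A retrieved sentence that shares both "road test" and "fuel pressure" with the input sentence would thus be listed ahead of one sharing "road test" alone.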

While the invention has been described with respect to the
preferred physical embodiments constructed in accordance 
therewith, it will be apparent to those skilled in the art that 
various modifications, variations and improvements of the 
present invention may be made in the light of the above 
teachings and within the purview of the appended claims 
without departing from the spirit and intended scope of the 
invention. In addition, those areas with which it is believed that
those of ordinary skill in the art are familiar have not been
described herein in order not to unnecessarily obscure the
invention described herein. It is, for instance, needless to say
that the user can change the selected word at any time by
typing in a new noun phrase or by selecting another element
of the sub-phrase list.

Accordingly, it is to be understood that the invention is 
not to be limited by the specific illustrative embodiments, 
but only by the scope of the appended claims. 

What is claimed is: 

1. A method for translating a word phrase from a first 
natural language to a second natural language, said word 
phrase being a group of two or more associated words, the 
method comprising the steps of: 

inputting a text written in the first language; 

extracting said word phrase from said text; 

querying a database for said extracted word phrase using 
a phrase index of said database; said phrase index 
indexing text fragments by word phrases; said text 
fragments representing a primary grammatical unit 
including at least one clause; the database containing 
pairs of text fragments, each pair including a text 
fragment of the first language and a corresponding text 
fragment of the second language; and 

obtaining a translation of said extracted word phrase 
based on one of the pairs of text fragments revealed 
during the step of querying the database. 

2. The method of claim 1, wherein said word phrase is a 
noun phrase. 

3. The method of claim 1, wherein said word phrase is a 
verb phrase. 

4. The method of claim 1, wherein said word phrase is a 
predicate involving at least one verb and one noun or 
adjective used as a noun. 

5. The method of claim 1, wherein the step of obtaining 
a translation includes the step of presenting to a user the 
revealed pairs of text fragments. 

6. The method of claim 1, wherein the step of obtaining 
a translation includes the step of presenting to a user 
proposed translations of the extracted word phrase based on 
the revealed pairs of text fragments. 

7. The method of claim 5, wherein the step of obtaining 
a translation further comprises the step of approving one of 
the revealed translations by the user and using the approved 
translation as translation of the extracted word phrase. 

8. The method of claim 1, wherein said word phrase is 
extracted from a text fragment of said input text, and the step 
of obtaining a translation includes the step of sorting the 
revealed pairs of text fragments according to the number of 
word phrases common with said text fragment of said input 
text. 

9. The method of claim 1, wherein the step of extracting 
a word phrase includes the step of selecting a word of said 
input text and determining the word phrase comprising said 
selected word. 

10. The method of claim 1, wherein said primary gram- 
matical units are sentences. 

11. The method of claim 1, wherein the step of querying 
the database for the extracted phrase includes the step of 
querying the database for all word phrases partly matching
the extracted phrase. 

12. The method of claim 1, wherein the step of obtaining 
a translation includes querying a terminology base. 

13. A computer-readable storage medium storing instructions
for translating a word phrase from a first natural
language to a second natural language by performing the
steps of:

inputting a text written in the first language;

extracting said word phrase from said text, said word
phrase being a group of two or more associated words;

querying a database for said extracted word phrase using
a phrase index of said database; said phrase index
indexing text fragments by word phrases; said text
fragments representing a primary grammatical unit
including at least one clause; the database containing
pairs of text fragments, each pair including a text
fragment of the first language and a corresponding text
fragment of the second language; and

obtaining a translation of said extracted word phrase
based on one of the pairs of text fragments revealed
during the step of querying the database.

14. A system for translating an input text from a natural
source language to a natural target language, the system
comprising:

storage means for storing a database containing a plurality
of pairs of text fragments; said text fragments representing
a primary grammatical unit including at least
one clause; each pair including a text fragment in the
source language and a corresponding text fragment in
the target language, each text fragment containing at
least one word phrase, said word phrase being a group
of two or more associated words;

a phrase extractor for extracting a word phrase from a text
fragment of said input text;

database retrieval means for retrieving, from said
database, pairs of text fragments that contain the
extracted word phrase, using a phrase index of said
database, said phrase index indexing text fragments by
word phrases; and

user interface means for allowing a user to select one of
said retrieved pairs of text fragments to obtain a translation
of the extracted word phrase.

15. The system of claim 14, wherein said word phrase is
a noun phrase.

16. The system of claim 14, wherein said word phrase is
a verb phrase.

17. The system of claim 14, wherein said word phrase is
a predicate involving at least one verb and one noun or
adjective used as a noun.

18. The system of claim 14, wherein said user interface
means is arranged for presenting to a user the retrieved pairs
of text fragments.

19. The system of claim 14, wherein said user interface
means is arranged for presenting to a user proposed translations
of the determined word phrase into the target language
based on the retrieved pairs of text fragments.

20. The system of claim 14, wherein said primary grammatical
units are sentences.

21. The system of claim 14, wherein said database
retrieval means is arranged for retrieving from the database
pairs of text fragments containing word phrases partially
matching the extracted word phrase.

22. The system of claim 14, further comprising a terminology
base to which said database retrieval means has
access.

23. A method for generating a text fragment database for
use in translating a word phrase from a first natural language
into a second natural language, said word phrase being a
group of two or more associated words, the method comprising
the steps of:

inputting a first document containing a text written in the
first language;

inputting a second document containing said text written
in the second language;

aligning corresponding text fragments of the first and
second documents; said text fragments representing a
primary grammatical unit including at least one clause;

extracting word phrases from the text fragments of the
first document; and

generating index information on the extracted word
phrases and the aligned text fragments holding the
word phrases, to generate a phrase index indexing text
fragments by word phrases.

24. The method of claim 23, wherein said word phrases 
are noun phrases. 

25. The method of claim 23, wherein said word phrases 
are verb phrases. 

26. The method of claim 23, wherein said word phrases 
are predicates involving at least one verb and one noun or 
adjective used as a noun. 

27. The method of claim 23, wherein said primary gram- 
matical units are sentences. 

28. The method of claim 23, wherein the steps of 
inputting, aligning and extracting are repeated for a plurality 
of paired documents. 

29. The method of claim 23, wherein the contents of said 
plurality of paired documents are of the same field. 